Hack a Neural Network in just 10 Lines of Code!!!




Photo by Jefferson Santos on Unsplash

Hello everyone! Hope you are doing well. Let’s dive right in.

What is Hacking a Neural Network?

Hacking a neural network simply means fooling it. Neural networks are increasingly being used in security and moderation systems across many fields, so it is important that they maintain their integrity under different types of attacks. In this article, I am going to explain how we can modify an image (without changing it too much) to force a neural network to misclassify it, and to do so with a very high degree of certainty.

Below is an excerpt from Ian Goodfellow's paper, Explaining and Harnessing Adversarial Examples.

In the above example, we can see that adding carefully, mathematically designed noise to an image can throw the neural network's predictions off by a huge amount, while keeping the image visually unchanged.
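The technique from the paper, the fast gradient sign method (FGSM), can be sketched in a few lines of PyTorch. Everything below is illustrative: the toy linear `model`, the random `image`, the `label`, and the value of `epsilon` are stand-ins, not the article's actual MNIST setup.

```python
import torch
import torch.nn as nn

# A minimal FGSM sketch: perturb each pixel by epsilon in the direction
# that increases the classification loss.
torch.manual_seed(0)
model = nn.Linear(784, 10)       # toy stand-in for a trained classifier
image = torch.rand(1, 784)       # toy stand-in for an input image
label = torch.tensor([0])        # assumed true label

image.requires_grad_()
loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.1
# Each pixel moves by exactly +/- epsilon, so the image stays visually close.
adversarial = image + epsilon * image.grad.sign()
```

Because the perturbation is bounded by epsilon per pixel, the adversarial image is guaranteed to stay close to the original in the max-norm sense, which is why it looks unchanged to a human.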

In this article, I am going to hack an MNIST digit classifier neural network. Let us take an image of the digit 0 (shown below). Since the model I have built has an accuracy of about 96%, it classifies this image correctly as 0. Now I want to modify this image of 0 by adding some strategic noise (arrived at through simple gradient descent) so that the MNIST model classifies it as an 8, while the image itself is not modified too much. To us humans the image still clearly shows the digit 0, yet the model thinks it contains the digit 8 with very high probability.

Here is what the results look like:

The above image is recognized by the MNIST classifier as the digit 0 with a certainty of 99%. Below is the modified image, which is recognized as the digit 8 by the NN with a certainty of 99.6%.

In this article, I am going to explain how I modified the first image into the second, i.e., how I fooled the neural network into classifying an image that is so obviously a 0 as an 8 (with a very high degree of certainty).

Procedure for Hacking an MNIST Classifier:

The model to be hacked is trained following the MNIST classifier found in this article. Now that we have a trained model ready to be hacked, let us get started.

We are going to use simple gradient descent to calculate the exact noise that has to be added to the original image so as to throw the NN off. First, I am going to define a loss function for gradient descent to follow. The loss function is one of the most important parts of any system involving gradient descent, since gradient descent is simply a way to minimize it. Hence, we need a loss function that reflects our goals: maximize the predicted probability of 8 for the image, and keep the difference between the original and modified images to a minimum. Here is what the loss function looks like.

def findloss(diff, pred):
    l1 = torch.mean(torch.square(diff))   # mean squared noise
    l2 = pred ** (-1)                     # inverse of predicted probability of 8
    fl = (l1 + l2) ** 0.5
    return fl

l1 represents the deviation (diff, or noise) between the modified and original images. l2 is the inverse of the probability that the NN classifies the modified image as the digit 8. Minimizing the square root of the sum of l1 and l2 minimizes both l1 and l2, which simultaneously minimizes the noise and maximizes the probability that the modified image is classified by the NN as an 8.
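A quick sanity check confirms the loss behaves as intended: small noise plus a high predicted probability of 8 yields a small loss, while large noise plus a low probability yields a large one. The specific noise vectors and probabilities below are made-up test values, not from the article.

```python
import torch

def findloss(diff, pred):
    l1 = torch.mean(torch.square(diff))  # mean squared noise
    l2 = pred ** (-1)                    # inverse of predicted probability of 8
    return (l1 + l2) ** 0.5

# Good case: no noise, model is 99% sure the image is an 8.
loss_good = findloss(torch.zeros(784), torch.tensor(0.99))   # sqrt(0 + 1/0.99) ~ 1.005

# Bad case: every pixel perturbed by 1, model gives 8 only 1% probability.
loss_bad = findloss(torch.ones(784), torch.tensor(0.01))     # sqrt(1 + 100) ~ 10.05
```

The gap between the two values is what gives gradient descent a clear direction to move in.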

Now, let us use gradient descent to adjust the noise so as to minimize the loss function defined above (this automatically achieves the goals we set for hacking the neural network). Below is the code for that.

learning_rate = 0.0005
num_descents = 2000000  # number of gradient descent steps

img = img.cuda()
# generate random noise to fine-tune using gradient descent
diff = torch.rand(784).cuda().requires_grad_()

for i in range(num_descents):
    # get the modified image
    imagef = img + diff

    # predicted probability that the modified image is an 8
    pred = torch.exp(model(imagef.reshape([1, 784])))[0][8]

    totalloss = findloss(diff, pred)
    if i % 10000 == 0:
        print('Loss and prediction by the model after ' + str(i) +
              ' steps of gradient descent are ' + str(totalloss.item()),
              str(pred.item()))

    # find gradients of the loss w.r.t. the noise
    totalloss.backward()
    gradients = diff.grad
    # torch.clip(gradients, max=100.0)
    with torch.no_grad():
        diff[1:] = diff[1:] - learning_rate * gradients[1:]
        diff.grad.data.zero_()
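The same attack loop can be exercised end to end on a toy stand-in model, to see the predicted probability of the target class rise as the noise is optimized. Everything here (the random linear "classifier", the learning rate, the step count) is illustrative, not the article's trained MNIST model or repo code.

```python
import torch

torch.manual_seed(0)
weights = torch.randn(10, 784) * 0.01  # toy stand-in for a trained classifier

def model_prob(x, target=8):
    # probability the toy classifier assigns to the target class
    return torch.softmax(weights @ x, dim=0)[target]

def findloss(diff, pred):
    return (torch.mean(torch.square(diff)) + pred ** (-1)) ** 0.5

img = torch.rand(784)                          # toy stand-in for the digit-0 image
diff = torch.zeros(784, requires_grad=True)    # noise to be optimized

init_pred = model_prob(img).item()

for _ in range(500):
    pred = model_prob(img + diff)
    loss = findloss(diff, pred)
    loss.backward()
    with torch.no_grad():
        diff -= 0.05 * diff.grad   # gradient descent step on the noise
        diff.grad.zero_()

final_pred = model_prob(img + diff).item()
```

After the loop, `final_pred` exceeds `init_pred`: gradient descent has found noise that pushes the toy model toward the target class while the l1 term keeps that noise small.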

The complete code can be found in my repo. The results achieved by this method have already been written above.

So, this is how simple gradient descent was able to fool the MNIST classifier.

Thank You for reading through the article. Here is another article that shows how powerful simple gradient descent can be.

Neural Networks Vs Simple Gradient Descent: The Age Old Brachistochrone Problem (medium.com)

If you liked the article, let’s connect.

Linkedin, Twitter, Github


